Automatic annotation for mapping discovery in data integration systems

Authors

  • Sonia Bergamaschi
  • Laura Po
  • Serena Sorrentino
Abstract

We propose CWSD (Combined Word Sense Disambiguation), an algorithm for the automatic annotation of structured and semi-structured data sources. Unlike most traditional WSD algorithms in the literature, which target textual data sources, our algorithm exploits information coming from the structure of the sources together with the lexical knowledge associated with the terms (the elements of the schemata). We integrated CWSD into the MOMIS system (Mediator EnvirOnment for Multiple Information Sources) [1], an I3 framework designed for the integration of data sources, where the lexical annotation of terms was performed manually by the user. CWSD combines two algorithms: a structural disambiguation algorithm, which starts the disambiguation of the terms using the semantic relationships extracted from the structural relationships of the schemata, and a WordNet Domains based disambiguation algorithm, which refines the disambiguation of the terms using domain information. Structural relationships are stored in a Common Thesaurus (CT) generated by the MOMIS system. The CT is a set of relationships describing inter- and intra-schema knowledge among the source schemas. From a source schema we extract the following relationships: SYN (Synonym-of), defined between two terms (a term is the name of a class or attribute of a schema) that are considered synonyms or equivalent; BT (Broader Terms), defined between two terms such that the first is more general than the second (the opposite of BT is NT, Narrower Terms); and RT (Related Terms), defined between two terms that are generally used together in the same context. The extracted ODLI3 relationships can be used in the disambiguation process according to a lexical database (in our approach, WordNet). When a CT relationship exists between two terms, the algorithm tries to find a lexical relationship between their meanings; if one is found, the meanings connected by this relationship are chosen as the correct ones to disambiguate the terms. The same holds if we find a chain of lexical relationships that connects the meanings of the terms. The WordNet Domains disambiguation algorithm exploits the information from WordNet Domains. WordNet Domains [2] can be considered an extended version of...
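
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the MOMIS implementation: the sense inventory `SENSES`, the candidate lists `CANDIDATES`, the function names, and the toy domain labels are all hypothetical stand-ins for WordNet and WordNet Domains. The rule used in the domain step (keep the candidate senses whose domain already occurs among the structurally chosen senses) is an assumption made for illustration, since the abstract does not spell out how domains are applied. RT relationships are not modelled.

```python
# Hypothetical toy sense inventory standing in for WordNet / WordNet Domains.
# Each sense has a domain label and a set of direct hypernym senses.
SENSES = {
    "person#1":  {"domain": "person",     "hypernyms": set()},
    "student#1": {"domain": "university", "hypernyms": {"person#1"}},
    "student#2": {"domain": "factotum",   "hypernyms": set()},
    "course#1":  {"domain": "university", "hypernyms": set()},
    "course#2":  {"domain": "gastronomy", "hypernyms": set()},
}

# Candidate senses for each schema term (class or attribute name).
CANDIDATES = {
    "person":  ["person#1"],
    "student": ["student#1", "student#2"],
    "course":  ["course#1", "course#2"],
}

# Common Thesaurus relationships: (term1, relation, term2).
CT = [("person", "BT", "student")]  # "person" is broader than "student"


def ancestors(sense):
    """Return every sense reachable through a chain of hypernym links."""
    seen, stack = set(), list(SENSES[sense]["hypernyms"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SENSES[current]["hypernyms"])
    return seen


def connected(rel, s1, s2):
    """Check whether a lexical relationship matching `rel` links s1 and s2."""
    if rel == "SYN":
        return s1 == s2              # same sense, i.e. same synset
    if rel == "BT":
        return s1 in ancestors(s2)   # s1 is more general than s2
    if rel == "NT":
        return s2 in ancestors(s1)   # s1 is more specific than s2
    return False                     # RT is not modelled in this sketch


def structural_step(ct):
    """For each CT relationship, keep the sense pairs it connects."""
    chosen = {}
    for t1, rel, t2 in ct:
        for s1 in CANDIDATES[t1]:
            for s2 in CANDIDATES[t2]:
                if connected(rel, s1, s2):
                    chosen.setdefault(t1, set()).add(s1)
                    chosen.setdefault(t2, set()).add(s2)
    return chosen


def annotate(ct):
    """Structural step first, then an assumed domain-based refinement."""
    chosen = structural_step(ct)
    known_domains = {SENSES[s]["domain"] for senses in chosen.values() for s in senses}
    result = {}
    for term, candidates in CANDIDATES.items():
        if term in chosen:
            result[term] = sorted(chosen[term])
        else:
            # Assumed rule: prefer senses whose domain was already selected.
            filtered = [s for s in candidates if SENSES[s]["domain"] in known_domains]
            result[term] = filtered or list(candidates)  # stay ambiguous if no match
    return result


annotations = annotate(CT)
```

With this toy data, the BT relationship selects `person#1` and `student#1` because one is a hypernym of the other, and the domain step then keeps `course#1` for the term "course" because its domain matches one already selected.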

Similar resources

Dealing with Uncertainty in Lexical Annotation

We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. an annotation is associated with a probability value....

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content to a query image. In most cases, inappropriate user-generated tags, as well as images without any tags, are among the challenges in this field and have a negative effect on the query's result. In this paper, a new method is presented for automatic image...

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Due to the ever-increasing volume of medical image data in the world’s medical centers and recent developments in medical imaging hardware and technology, software analysis of medical data has become necessary. Equipping medical science with intelligent tools for the diagnosis and treatment of illnesses has reduced physicians’ errors as well as physical and financial damages. In this article we pr...

Automatic annotation for mappings discovery in data integration systems

In this article we present CWSD (Combined Word Sense Disambiguation), a method and a software tool for enabling automatic lexical annotation of local (structured and semi-structured) data sources in a data integration system. CWSD is based on the exploitation of WordNet Domains and the lexical and structural knowledge of the data sources. The method extends the semi-automatic lexical annotation ...

Uncertainty in data integration systems: automatic generation of probabilistic relationships

We propose a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is a probabilistic word sense disambiguation (PWSD), which makes it possible to automatically lexically annotate (i.e. annotation w.r.t. a...

Fuzzy Neighbor Voting for Automatic Image Annotation

With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...

Publication date: 2008